This report compares the physical characteristics and quality score of almost 5,000 white wine samples.
## [1] 4898 13
## X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1 7.0 0.27 0.36 20.7 0.045
## 2 2 6.3 0.30 0.34 1.6 0.049
## 3 3 8.1 0.28 0.40 6.9 0.050
## 4 4 7.2 0.23 0.32 8.5 0.058
## 5 5 7.2 0.23 0.32 8.5 0.058
## 6 6 8.1 0.28 0.40 6.9 0.050
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 45 170 1.0010 3.00 0.45 8.8
## 2 14 132 0.9940 3.30 0.49 9.5
## 3 30 97 0.9951 3.26 0.44 10.1
## 4 47 186 0.9956 3.19 0.40 9.9
## 5 47 186 0.9956 3.19 0.40 9.9
## 6 30 97 0.9951 3.26 0.44 10.1
## quality
## 1 6
## 2 6
## 3 6
## 4 6
## 5 6
## 6 6
## 'data.frame': 4898 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## [1] 0
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0800 Min. :0.0000
## 1st Qu.:1225 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700
## Median :2450 Median : 6.800 Median :0.2600 Median :0.3200
## Mean :2450 Mean : 6.855 Mean :0.2782 Mean :0.3342
## 3rd Qu.:3674 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900
## Max. :4898 Max. :14.200 Max. :1.1000 Max. :1.6600
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.600 Min. :0.00900 Min. : 2.00
## 1st Qu.: 1.700 1st Qu.:0.03600 1st Qu.: 23.00
## Median : 5.200 Median :0.04300 Median : 34.00
## Mean : 6.391 Mean :0.04577 Mean : 35.31
## 3rd Qu.: 9.900 3rd Qu.:0.05000 3rd Qu.: 46.00
## Max. :65.800 Max. :0.34600 Max. :289.00
## total.sulfur.dioxide density pH sulphates
## Min. : 9.0 Min. :0.9871 Min. :2.720 Min. :0.2200
## 1st Qu.:108.0 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100
## Median :134.0 Median :0.9937 Median :3.180 Median :0.4700
## Mean :138.4 Mean :0.9940 Mean :3.188 Mean :0.4898
## 3rd Qu.:167.0 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500
## Max. :440.0 Max. :1.0390 Max. :3.820 Max. :1.0800
## alcohol quality
## Min. : 8.00 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.40 Median :6.000
## Mean :10.51 Mean :5.878
## 3rd Qu.:11.40 3rd Qu.:6.000
## Max. :14.20 Max. :9.000
## 'data.frame': 4898 obs. of 13 variables:
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## $ quality.factor : Ord.factor w/ 7 levels "3"<"4"<"5"<"6"<..: 4 4 4 4 4 4 4 4 4 4 ...
The data includes a quality score and 11 physical characteristics for each of 4,898 white wine samples. There are no “NA” values. The variable “X” was removed. While the allowed quality range is 0 - 10, the observed range is 3 - 9.
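The report does not show the code behind these checks. Below is a minimal base-R sketch of the same three steps: count NAs, drop `X`, and add an ordered `quality.factor`. The six-row data frame is hypothetical toy data standing in for `white_wine`, not the real file.

```r
# Hypothetical six-row stand-in for white_wine (not the real data)
white_wine <- data.frame(
  X = 1:6,
  alcohol = c(8.8, 9.5, 10.1, 9.9, 11.0, 12.8),
  quality = c(6L, 6L, 5L, 7L, 3L, 9L)
)

# Total number of missing values across all columns (0 means no NAs)
n_missing <- sum(is.na(white_wine))

# Drop the row identifier, which is not used in the analysis
white_wine$X <- NULL

# Ordered factor copy of quality; levels are the observed scores in numeric order
white_wine$quality.factor <- factor(white_wine$quality, ordered = TRUE)

print(n_missing)
str(white_wine)
```

On the full data, the levels of `quality.factor` would be the seven observed scores 3 through 9, which matches the `Ord.factor w/ 7 levels` line in the `str()` output above.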
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
## x freq
## 1 FALSE 4878
## 2 TRUE 20
## x freq
## 1 FALSE 4893
## 2 TRUE 5
The quality scores appear approximately normally distributed between 3 and 9, but the distribution is not symmetric: there are about 500 more scores of 5 than of 7, and 15 more scores of 3 than of 9 (20 versus 5).
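The FALSE/TRUE tables above count how many wines satisfy a logical condition: first quality equal to 3, then quality equal to 9. The exact command is not shown. The base-R sketch below reproduces the same counts, plus the per-score comparison used in the paragraph, on a hypothetical vector of scores.

```r
# Hypothetical quality scores (not the real data)
quality <- c(3, 3, 5, 5, 5, 6, 6, 6, 6, 7, 9)

# Summing a logical condition counts its TRUE values (the "TRUE" row above)
n_quality_3 <- sum(quality == 3)
n_quality_9 <- sum(quality == 9)

# Frequency of every score, then the difference between two scores
score_counts <- table(quality)
diff_5_vs_7 <- as.integer(score_counts["5"]) - as.integer(score_counts["7"])
diff_3_vs_9 <- n_quality_3 - n_quality_9

print(score_counts)
```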
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
Both fixed.acidity and volatile.acidity are right skewed, each with extreme outliers. Do fixed.acidity and volatile.acidity correlate?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
The variable citric.acid appears approximately normally distributed, with extreme outliers slightly below 1.25 and 1.75. There is also a peak near 0.5. Why is there a peak at 0.5?
## [1] "1.2"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
The most common residual sugar value (the mode) is 1.2 grams/liter, and many wines cluster around 1 to 2 grams/liter, although the median is 5.2 grams/liter. Because of the right skew in the histogram, the data was transformed using a log10 scale. This transformation revealed a bimodal distribution.
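The quoted `"1.2"` output suggests the mode was read from the names of a frequency table, because those names are strings. The sketch below shows one hypothetical way to do that in base R, followed by the log10 transformation. `mode_value` is an illustrative helper, not a built-in. Base R's `mode()` returns an object's storage type, not the statistical mode.

```r
# Hypothetical helper: most frequent value, returned as a string because
# table() stores the distinct values as character names
mode_value <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]  # which.max() keeps the first value on ties
}

# Hypothetical residual sugar values in grams/liter (not the real data)
sugar <- c(1.2, 1.2, 1.6, 6.9, 8.5, 20.7, 1.2, 65.8)
sugar_mode <- mode_value(sugar)

# log10 compresses the long right tail: 65.8 maps to about 1.82
log_sugar <- log10(sugar)
print(sugar_mode)
print(round(log_sugar, 2))
```

In ggplot2 you can get the same view without transforming the column by adding `scale_x_log10()` to a histogram.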
According to the information provided with the data, “it’s rare to find wines with less than 1 gram/liter [of sugar] and wines with greater than 45 grams/liter [of sugar] are considered sweet.”
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
With outliers removed, the chloride distribution appears normal.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 134.0 138.4 167.0 440.0
The variable “free.sulfur.dioxide” is a subset of “total.sulfur.dioxide,” therefore total.sulfur.dioxide’s mean and median are higher than free.sulfur.dioxide’s. According to the information provided with the data, “at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine.” As SO2 levels increase and become evident, does the quality of the wine decrease?
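One way to start answering that question is to split the wines at the 50 ppm threshold quoted above and compare mean quality in each group. The sketch below assumes free.sulfur.dioxide is measured in mg/dm³, which is numerically close to ppm for a dilute water-based solution. The data are hypothetical toy values.

```r
# Hypothetical toy data (not the real white_wine values)
wines <- data.frame(
  free.sulfur.dioxide = c(45, 14, 30, 47, 55, 62, 30, 80),
  quality = c(6, 6, 6, 6, 5, 5, 7, 4)
)

# TRUE where free SO2 exceeds the 50 ppm level at which it becomes noticeable
wines$so2_evident <- wines$free.sulfur.dioxide > 50

# Mean quality on each side of the threshold (groups "FALSE" and "TRUE")
mean_quality <- tapply(wines$quality, wines$so2_evident, mean)
print(mean_quality)
```

On the real data, a difference between these two means would only be suggestive. It does not control for alcohol or residual sugar.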
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
Density values vary very little: the middle 50% fall within a band of 0.0044 (0.9917 to 0.9961). With outliers removed, a slightly right skewed distribution is revealed, with its longer tail extending toward 1.000, the density of water (https://water.usgs.gov/edu/density.html).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
pH values are approximately normally distributed. Acidity increases as pH decreases toward 0 (https://water.usgs.gov/edu/ph.html). According to the information provided with the data, “most wines are between 3-4 on the pH scale.”
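Lower pH means higher acidity because pH is defined on a logarithmic scale. The worked illustration below uses standard chemistry, not the dataset documentation. Here $[\mathrm{H^+}]$ is the hydrogen-ion concentration in mol/L, and the two pH values are the sample's minimum and maximum:

```latex
\begin{aligned}
\mathrm{pH} &= -\log_{10}[\mathrm{H^+}] \;\Longrightarrow\; [\mathrm{H^+}] = 10^{-\mathrm{pH}} \\
\frac{[\mathrm{H^+}]_{\mathrm{pH}=2.72}}{[\mathrm{H^+}]_{\mathrm{pH}=3.82}} &= 10^{3.82 - 2.72} = 10^{1.1} \approx 12.6
\end{aligned}
```

So the most acidic wine in the sample has roughly 12.6 times the hydrogen-ion concentration of the least acidic one, even though their pH values differ by only 1.1.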
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4898 0.5500 1.0800
## [1] "0.5"
According to the information provided with the data, sulphates are “a wine additive which can contribute to sulfur dioxide gas (S02) levels, wich (sic) acts as an antimicrobial and antioxidant.” Is this additive dosed at a standard amount, which would explain the mode of 0.5?
## [1] "9.4"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
Alcohol levels are right skewed with a mode of 9.4. To address the skew in the histogram, the alcohol data was transformed using a log10 transformation. Is there a correlation between higher alcohol and higher quality?
## int [1:77] 6 4 6 6 4 6 5 7 5 5 ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 5.000 5.377 6.000 8.000
As stated earlier, “it’s rare to find wines with less than 1 gram/liter and wines with greater than 45 grams/liter are considered sweet.” This subset examines the quality of wines with less than 1 gram/liter of sugar. The 77 white wines in it skew toward lower quality (mean 5.377, median 5).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6 6 6 6 6 6
This subset examines the quality of wines with more than 45 grams/liter of sugar. Only one wine has more than 45 grams/liter of residual.sugar. Its quality score of 6 is above the mean of the entire sample (5.878).
## int [1:1229] 6 6 5 6 5 5 5 6 6 6 ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.739 6.000 9.000
This subset examines the quality of wines with more than 9.9 grams/liter of sugar, the third quartile of the entire white_wine sample. Like the wines with low amounts of sugar, this subset skews toward lower quality (mean 5.739).
The mean quality for the entire sample of white wine is 5.878. The mean quality for wines with residual sugar of less than 1 is 5.377. The mean quality for wines with residual sugar greater than or equal to 45 is 6, but since there is only one wine with “high” residual sugar this result does not provide significant insight. The mean quality for wines with residual sugar greater than or equal to 9.9, the third quartile of the entire sample, is 5.739.
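Each of the four means in this paragraph is a mean over a filtered subset. The base-R sketch below shows that pattern on hypothetical toy data. The thresholds match the text; the data do not. Printing each group's size next to its mean makes the one-wine problem visible.

```r
# Hypothetical toy data (not the real white_wine values)
wines <- data.frame(
  residual.sugar = c(0.8, 0.9, 1.6, 6.9, 10.2, 12.0, 65.8),
  quality = c(5, 5, 6, 6, 5, 6, 6)
)

# Quality scores for each subset described in the text
groups <- list(
  all = wines$quality,
  low_sugar = subset(wines, residual.sugar < 1)$quality,
  sweet = subset(wines, residual.sugar >= 45)$quality,
  upper_quartile = subset(wines, residual.sugar >= 9.9)$quality
)

# Group size alongside the mean: a mean over n = 1 carries little information
group_n <- sapply(groups, length)
group_mean <- sapply(groups, mean)
print(rbind(n = group_n, mean = round(group_mean, 3)))
```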
## int [1:1256] 5 7 7 8 8 6 7 5 6 7 ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 6.000 6.000 6.423 7.000 9.000
## int [1:2555] 6 6 6 6 6 6 6 6 5 5 ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.791 6.000 9.000
The mean quality for the entire sample of white wine is 5.878. The mean quality for wines with alcohol greater than 11.4 (the third quartile) is 6.423. The mean quality for wines with alcohol content within the interquartile range, i.e. between 9.50 and 11.40, is 5.791.
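You can also group by alcohol quartile in one step with `cut()`, using the first and third quartiles from the summary above (9.5 and 11.4) as break points. This is a sketch on hypothetical data. The report does not say which group the boundary values were assigned to. The `right = TRUE` choice below, where each interval includes its upper limit, is an assumption.

```r
# Hypothetical toy data (not the real white_wine values)
alcohol <- c(8.8, 9.5, 10.1, 9.9, 11.0, 11.4, 12.0, 12.8)
quality <- c(5, 6, 6, 5, 6, 6, 7, 7)

# Breaks at the sample's 1st and 3rd quartiles; right = TRUE gives the
# intervals (-Inf, 9.5], (9.5, 11.4] and (11.4, Inf)
alcohol_group <- cut(alcohol,
                     breaks = c(-Inf, 9.5, 11.4, Inf),
                     labels = c("low", "middle", "high"),
                     right = TRUE)

# Mean quality per alcohol group
mean_quality <- tapply(quality, alcohol_group, mean)
print(mean_quality)
```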
There are 4,898 white wines in the dataset with 12 features (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, quality). Further information, including units, is below:
Output variable (based on sensory data): - quality (score between 0 and 10)
Other observations:
The data set contains no ‘NA’ values
Quality is approximately normally distributed. The range of quality is 3-9.
Most other variables are also approximately normally distributed, with two notable exceptions: residual sugar and alcohol.
Residual sugar ranges from 0.600 to 65.800, with a mean of 6.391 and a mode of 1.2, producing a right skewed histogram.
Alcohol ranges from 8 to 14.20, with a mean of 10.51 and a mode of 9.4, producing a right skewed histogram.
The main feature of interest in the data set is quality. The analysis will seek to determine which features most strongly correlate with the quality of a white wine.
While fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol will likely all impact the quality of white wine, I suspect alcohol and residual sugar will impact the quality of white wine more than the other variables.
No new variables were created.
Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
The variable “X” was removed because the unique identifiers will not be utilized.
A log10 transformation was used on the right skewed residual.sugar and alcohol distributions. The transformed residual sugar distribution appears bimodal, with sugar content peaking just above 1 and just below 10. The transformed alcohol distribution appears normal.
In several plots the x-axis was limited in order to remove outliers and establish a better understanding of the bulk of the data. Also, bin widths were adjusted to align with the significant figures for each variable.
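A common way to choose such axis limits is from quantiles, for example keeping the central 98% of values. The sketch below is an assumption about how this could be done, not the report's code. The chloride values are hypothetical, apart from the minimum and maximum, which match the summary above.

```r
# Hypothetical chloride values; 0.009 and 0.346 are the sample's min and max
chlorides <- c(0.045, 0.049, 0.050, 0.058, 0.044, 0.036, 0.043, 0.346, 0.009, 0.047)

# Lower and upper limits at the 1st and 99th percentiles
limits <- quantile(chlorides, probs = c(0.01, 0.99))

# Values inside the limits; the extreme low and high values fall outside
trimmed <- chlorides[chlorides >= limits[1] & chlorides <= limits[2]]
print(limits)
print(range(trimmed))
```

In ggplot2 the two ways of applying such limits behave differently. `xlim()` drops the excluded rows before statistics such as histogram bins are computed. `coord_cartesian(xlim = ...)` only zooms the view and keeps every row.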
## fixed.acidity volatile.acidity citric.acid
## fixed.acidity 1.000 -0.023 0.289
## volatile.acidity -0.023 1.000 -0.149
## citric.acid 0.289 -0.149 1.000
## residual.sugar 0.089 0.064 0.094
## chlorides 0.023 0.071 0.114
## free.sulfur.dioxide -0.049 -0.097 0.094
## total.sulfur.dioxide 0.091 0.089 0.121
## density 0.265 0.027 0.150
## pH -0.426 -0.032 -0.164
## sulphates -0.017 -0.036 0.062
## alcohol -0.121 0.068 -0.076
## quality -0.114 -0.195 -0.009
## residual.sugar chlorides free.sulfur.dioxide
## fixed.acidity 0.089 0.023 -0.049
## volatile.acidity 0.064 0.071 -0.097
## citric.acid 0.094 0.114 0.094
## residual.sugar 1.000 0.089 0.299
## chlorides 0.089 1.000 0.101
## free.sulfur.dioxide 0.299 0.101 1.000
## total.sulfur.dioxide 0.401 0.199 0.616
## density 0.839 0.257 0.294
## pH -0.194 -0.090 -0.001
## sulphates -0.027 0.017 0.059
## alcohol -0.451 -0.360 -0.250
## quality -0.098 -0.210 0.008
## total.sulfur.dioxide density pH sulphates alcohol
## fixed.acidity 0.091 0.265 -0.426 -0.017 -0.121
## volatile.acidity 0.089 0.027 -0.032 -0.036 0.068
## citric.acid 0.121 0.150 -0.164 0.062 -0.076
## residual.sugar 0.401 0.839 -0.194 -0.027 -0.451
## chlorides 0.199 0.257 -0.090 0.017 -0.360
## free.sulfur.dioxide 0.616 0.294 -0.001 0.059 -0.250
## total.sulfur.dioxide 1.000 0.530 0.002 0.135 -0.449
## density 0.530 1.000 -0.094 0.074 -0.780
## pH 0.002 -0.094 1.000 0.156 0.121
## sulphates 0.135 0.074 0.156 1.000 -0.017
## alcohol -0.449 -0.780 0.121 -0.017 1.000
## quality -0.175 -0.307 0.099 0.054 0.436
## quality
## fixed.acidity -0.114
## volatile.acidity -0.195
## citric.acid -0.009
## residual.sugar -0.098
## chlorides -0.210
## free.sulfur.dioxide 0.008
## total.sulfur.dioxide -0.175
## density -0.307
## pH 0.099
## sulphates 0.054
## alcohol 0.436
## quality 1.000
## [,1]
## fixed.acidity -0.114
## volatile.acidity -0.195
## citric.acid -0.009
## residual.sugar -0.098
## chlorides -0.210
## free.sulfur.dioxide 0.008
## total.sulfur.dioxide -0.175
## density -0.307
## pH 0.099
## sulphates 0.054
## alcohol 0.436
## quality 1.000
The correlation between all variables and ‘quality’.
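The one-column `[,1]` outputs are what `cor()` returns when its first argument is a data frame and its second is a single numeric vector. The sketch below uses simulated data. The simulation and its coefficients are hypothetical, chosen only so that quality rises with alcohol and density falls with alcohol.

```r
set.seed(42)
n <- 200

# Simulated stand-in for three white_wine columns (not the real data)
alcohol <- rnorm(n, mean = 10.5, sd = 1.2)
density <- 1.01 - 0.0018 * alcohol + rnorm(n, sd = 0.001)
quality <- round(2 + 0.37 * alcohol + rnorm(n, sd = 0.7))
wines <- data.frame(alcohol, density, quality)

# Full Pearson correlation matrix, rounded like the output above
full <- round(cor(wines), 3)

# Correlation of every column with quality: a one-column matrix printed as [,1]
with_quality <- round(cor(wines, wines$quality), 3)
print(full)
print(with_quality)
```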
## [,1]
## fixed.acidity 0.023
## volatile.acidity 0.071
## citric.acid 0.114
## residual.sugar 0.089
## chlorides 1.000
## free.sulfur.dioxide 0.101
## total.sulfur.dioxide 0.199
## density 0.257
## pH -0.090
## sulphates 0.017
## alcohol -0.360
## quality -0.210
The correlation between all variables and ‘chlorides’.
## [,1]
## fixed.acidity 0.265
## volatile.acidity 0.027
## citric.acid 0.150
## residual.sugar 0.839
## chlorides 0.257
## free.sulfur.dioxide 0.294
## total.sulfur.dioxide 0.530
## density 1.000
## pH -0.094
## sulphates 0.074
## alcohol -0.780
## quality -0.307
The correlation between all variables and ‘density’.
## [,1]
## fixed.acidity -0.121
## volatile.acidity 0.068
## citric.acid -0.076
## residual.sugar -0.451
## chlorides -0.360
## free.sulfur.dioxide -0.250
## total.sulfur.dioxide -0.449
## density -0.780
## pH 0.121
## sulphates -0.017
## alcohol 1.000
## quality 0.436
The correlation between all variables and ‘alcohol’.
# Scatter-plot matrices with correlation coefficients, via psych::pairs.panels()
# http://www.sthda.com/english/wiki/scatter-plot-matrices-r-base-graphs
library(psych)
# All variables except the ordered-factor copy of quality
pairs.panels(subset(white_wine, select=-c(quality.factor)))
# Quality plus the variables most strongly correlated with quality or alcohol
pairs.panels(subset(white_wine, select=c(residual.sugar, chlorides,
                                         total.sulfur.dioxide, density, alcohol,
                                         quality)))
Quality is the variable of focus, so correlations with quality are highlighted. The strongest correlations exist between quality and the following:

- chlorides (-0.210)
- density (-0.307)
- alcohol (0.436)

Because alcohol has the strongest correlation to quality, the relationship between alcohol and all other variables is considered. The strongest correlations exist between alcohol and the following:

- density (-0.780)
- residual.sugar (-0.451)
- total sulfur dioxide (-0.449)

Between all variables, the strongest correlations exist between the following:

- density and residual sugar (0.839)
- density and alcohol (-0.780)
- density and total sulfur dioxide (0.530)
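You can find the three strongest pairs mechanically by ranking the upper triangle of the correlation matrix by absolute value. The sketch below re-enters four variables' coefficients from the matrix printed earlier rather than recomputing them from data.

```r
vars <- c("residual.sugar", "density", "alcohol", "total.sulfur.dioxide")

# Coefficients copied from the correlation matrix printed above
r <- matrix(c(
   1.000,  0.839, -0.451,  0.401,
   0.839,  1.000, -0.780,  0.530,
  -0.451, -0.780,  1.000, -0.449,
   0.401,  0.530, -0.449,  1.000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Each unordered pair once: positions in the upper triangle
idx <- which(upper.tri(r), arr.ind = TRUE)
pair_table <- data.frame(var1 = rownames(r)[idx[, "row"]],
                         var2 = colnames(r)[idx[, "col"]],
                         r = r[idx],
                         stringsAsFactors = FALSE)

# Rank by strength regardless of sign and keep the top three
pair_table <- pair_table[order(-abs(pair_table$r)), ]
top <- head(pair_table, 3)
print(top)
```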
As quality increases, the concentration of chlorides decreases.
According to the boxplot above, no inference can be made about the correlation between quality and density.
Overall, quality increases as alcohol concentration increases. The exception is the range from quality 3 to 5, over which alcohol concentration decreases as quality increases.
As the concentration of alcohol increases, density decreases.
As the concentration of alcohol increases, residual sugar decreases especially from an alcohol concentration of 8-10%.
As the concentration of alcohol increases, total sulfur dioxide decreases.
As density increases, residual sugar increases.
As density increases, alcohol concentration decreases.
As density increases, total sulfur dioxide increases.
## residual.sugar density alcohol total.sulfur.dioxide
## 2782 65.8 1.03898 11.7 160
In the plots above, there is an outlier at a density just below 1.04. This appears to be a result of the high concentration of residual sugar in that wine (65.8 grams/liter, the maximum in the sample).
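You can locate the row shown above with `which.max()`. In the sketch below, the first three rows are copied from the `head()` output at the start of the report. The fourth row carries the values of row 2782 from the output above. Comparing `which.max()` on both columns checks whether the densest wine is also the sweetest.

```r
# Rows 1-3 from the head() output; the last row repeats row 2782's values
wines <- data.frame(
  residual.sugar = c(20.7, 1.6, 6.9, 65.8),
  density = c(1.0010, 0.9940, 0.9951, 1.03898),
  alcohol = c(8.8, 9.5, 10.1, 11.7),
  total.sulfur.dioxide = c(170, 132, 97, 160),
  row.names = c("1", "2", "3", "2782")
)

# Row with the highest density, showing the variables discussed above
densest <- wines[which.max(wines$density), ]
print(densest)

# TRUE if the densest wine is also the one with the most residual sugar
same_wine <- which.max(wines$density) == which.max(wines$residual.sugar)
```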
Quality correlates most strongly with chlorides, density, and alcohol.
As quality increases, the concentration of chlorides decreases.
As quality increases, density decreases.
As quality increases, the concentration of alcohol increases, except from quality 3 to 5, where alcohol concentration decreases as quality increases.
Because alcohol has the strongest correlation to quality, the relationship between alcohol and all other variables was considered. The strongest correlations exist between alcohol and the following: density, residual sugar, total sulfur dioxide.
As the concentration of alcohol increases, density decreases. “Alcohol, or ethanol, is the intoxicating agent found in beer, wine and liquor.” (https://www.drugs.com/alcohol.html) The density of ethanol is 0.7893 g/cm³ (https://pubchem.ncbi.nlm.nih.gov/compound/ethanol#section=Density), lower than that of water (1.000 g/cm³). Therefore, as the concentration of alcohol increases, the density of the wine decreases.
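A volume-weighted average of the two main components' densities gives a rough sense of this effect's direction. This assumes ideal mixing, which is only an approximation. Water and ethanol contract slightly when mixed, and real wine also contains sugar, acids, and other solutes, so the absolute numbers will not match the measured densities. The sketch only illustrates the direction of the change.

```r
# Idealized density of a water-ethanol mix (g/cm^3) for an alcohol
# fraction by volume; ignores volume contraction and all other solutes
mix_density <- function(abv, rho_water = 1.000, rho_ethanol = 0.7893) {
  (1 - abv) * rho_water + abv * rho_ethanol
}

# Alcohol range in the sample: 8% to 14.2% by volume
low_alcohol <- mix_density(0.08)
high_alcohol <- mix_density(0.142)
print(c(low = low_alcohol, high = high_alcohol))
```

The real wines are denser than these estimates (the sample median is 0.9937), because of volume contraction and dissolved sugar and other solids. The ordering still matches the negative correlation (-0.780) between alcohol and density.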
As the concentration of alcohol increases, residual sugar decreases, especially for alcohol concentrations from 8-10%. This relationship follows from how alcohol is produced: “[W]hen winemaking happens, yeast eats sugar and makes ethanol (alcohol) as a by-product.” (https://winefolly.com/review/sugar-in-wine-chart/) The sugar left over after this process is called “residual sugar.” (https://bit.ly/2K2nFv1) Therefore, the more sugar the yeast consumes, the more alcohol the wine contains and the less residual sugar remains.
As the concentration of alcohol increases, total sulfur dioxide decreases. “Sulfur dioxide (SO2) is important in the winemaking process as it aids in preventing microbial growth[…].” https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3472855/ “[Alcohol] acts synergistically and enhances the bacteria-killing effect of molecular SO2 [sulfur dioxide], so high-alcohol wines require less SO2 protection” (https://www.extension.purdue.edu/extmedia/fs/fs-52-w.pdf). Therefore, as alcohol concentration increases, the amount of total sulfur dioxide decreases.
The strongest correlations between any two variables were considered. These were between density and each of the following: residual sugar, alcohol, and total sulfur dioxide.
As density increases, residual sugar increases. This correlation occurs because “[t]he more sugar that’s mixed into a measured amount of water, the higher the density of the mixture.” (https://www.stevespanglerscience.com/lab/experiments/sugar-rainbow/)
As density increases, alcohol concentration decreases. “Alcohol, or ethanol, is the intoxicating agent found in beer, wine and liquor.” (https://www.drugs.com/alcohol.html) The density of ethanol is 0.7893 g/cm³ (https://pubchem.ncbi.nlm.nih.gov/compound/ethanol#section=Density), lower than that of water. Therefore, as the concentration of alcohol increases, the density of the wine decreases.
As density increases, total sulfur dioxide increases. The density of sulfur dioxide is 1.434 (https://pubchem.ncbi.nlm.nih.gov/compound/sulfur_dioxide#section=Density), higher than that of water. Therefore, as the concentration of sulfur dioxide increases, the density of wine increases.
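A back-of-envelope check on the size of this effect is sketched below. It assumes the units given in the dataset's documentation: total sulfur dioxide in mg/dm³ and density in g/cm³. Under that assumption, the mass of SO2 itself can account for only a small share of the density differences. The correlation may therefore owe more to total sulfur dioxide's own correlations with residual sugar (0.401) and alcohol (-0.449).

```r
# Convert mg/dm^3 to g/cm^3: 1 g = 1000 mg and 1 dm^3 = 1000 cm^3
mg_dm3_to_g_cm3 <- function(x) x / 1e6

# Figures from the summary statistics earlier in the report
so2_max <- mg_dm3_to_g_cm3(440)               # largest total SO2
so2_iqr <- mg_dm3_to_g_cm3(167.0 - 108.0)     # total SO2 interquartile range
density_iqr <- 0.9961 - 0.9917                # density interquartile range

# Upper-bound share of the density IQR that the SO2 IQR could supply by mass
# alone (treats the SO2 as added mass with no added volume)
share <- so2_iqr / density_iqr
print(c(so2_max = so2_max, share = share))
```

Under these assumptions, the SO2 interquartile range corresponds to only about 1.3% of the density interquartile range.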
As density decreases and alcohol increases, the quality scores appear to increase.
As residual sugar decreases and alcohol increases, the quality scores appear to increase.
As total sulfur dioxide decreases and alcohol increases, quality scores appear to increase.
The relationship between density, alcohol, and quality appears to be the strongest. The plot of residual sugar and alcohol suggested that quality increases as residual sugar decreases and alcohol increases, but the relationship does not appear strong. The plot of total sulfur dioxide and alcohol showed a similar result.
Because density and alcohol were the variables most closely correlated with quality, and were also closely correlated with one another, the strength of the relationship between density, alcohol, and quality was not surprising.
print(mean(white_wine$quality))
## [1] 5.877909
The plot “Quality of Wine” shows the quality scores of 4,898 wines. Wine quality is the feature of focus for this study. While the allowed range of scores is 0-10, only scores of 3-9 were given. The mean quality score is 5.878.
Plot two highlights the strongest correlation between two variables: 0.84 between residual sugar and density. This relationship is explained by the density of sugar and the composition of wine. “A wine typically contains ethanol (~13%) [and] water (85%)… .” (https://bit.ly/2MXqAXh) The density of water is 1.00 (https://water.usgs.gov/edu/density.html). Because wine is about 85% water, the densities of the wines in the sample are near 1.00. Sugar is denser than water (density of 1.56), so more residual sugar increases the density of the wine (https://bit.ly/2KNqlkd). This correlation led to further research showing that residual sugar, total sulfur dioxide, and alcohol depend on one another. The amount of alcohol in a wine depends on the amount of sugar “eaten” by yeast (https://winefolly.com/review/sugar-in-wine-chart/): the more sugar eaten, the more alcohol in the wine and the less residual sugar remains (https://bit.ly/2K2nFv1). In turn, as the concentration of alcohol increases, total sulfur dioxide decreases, because less sulfur dioxide needs to be added to prevent microbial growth (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3472855/). Therefore, a wine with less residual sugar will have more alcohol, and a wine with more alcohol will have less total sulfur dioxide, resulting in a less dense wine.
Plot three highlights the relationship between alcohol, density, and quality. As described in plot two, density and alcohol correlate with one another. These factors also appear to correlate with quality: as density decreases and alcohol increases (and, as described in plot two, residual sugar and total sulfur dioxide also decrease), the quality scores increase.
------
The analysis examined a sample of white wine with 4,898 observations of 13 variables. The variable of focus was the wine’s quality score. The initial analysis sought to understand the relationship between quality and all other variables. Unfortunately, no single variable correlated strongly with quality. Surprisingly, this analysis revealed a stronger relationship between density and three other variables: residual sugar, total sulfur dioxide, and alcohol. Further research revealed how these variables depend on one another. Quality was then plotted against alcohol and density, which showed that quality correlates with an increase in alcohol concentration and a decrease in density. Further research should seek to isolate the dependent factors within the dataset in order to control for amounts of residual sugar and total sulfur dioxide.